1 Analysis Cohort

In this analysis, there are 150 subjects and 1205 observations. Mean (range) of follow-up time per patient was 1 (0, 2) years. Mean (range) number of observations was 8 (1, 29). The same number of patients and observations were utilized in each model below. We assume for each model that the data are missing at random, except for the joint longitudinal-survival model. The dependent variable is GLI FEV1 (% predicted) named “y” in the output below. The time scale is patients’ age at visit. Age is expressed in years. Age at CF diagnosis (in years) was included as a covariate in each model. Below are summary statistics for each variable used in the model. Once a patient was recorded as having a lung transplant, his or her data were censored from the model.

##        y               Age        Age_Diag_per_patient
##  Min.   : 10.60   Min.   : 6.01   Min.   : 0.000      
##  1st Qu.: 43.57   1st Qu.:10.94   1st Qu.: 0.100      
##  Median : 73.99   Median :18.67   Median : 0.300      
##  Mean   : 70.57   Mean   :19.47   Mean   : 2.004      
##  3rd Qu.: 94.66   3rd Qu.:24.54   3rd Qu.: 1.700      
##  Max.   :131.21   Max.   :52.12   Max.   :17.500

The Kaplan-Meier (KM) survival curve for those who died or received a lung transplant is plotted below showing the survival probability over age at visit (top graphic) and the number still alive/without lung transplant (bottom graphic).

Here we plot the frequency of subjects who died or received lung transplant over age from 6 to 30 years old. The event rate is around 0.12 or 12%. As age increases, the frequency of death/lung transplant increases. Since there are more subjects who dropped out due to death or receiving a lung transplant, there is the potential for a survivor bias, which we investigate in a later section.

In the following report, we investigate three commonly used structures in CF FEV1 analysis. These include the linear mixed-effects model (LME), Generalized Estimating Equations (GEE) and joint modelling (JM). For the LME models, we include different combinations of random effects, splines and correlation structures as used in published CF studies (Section 2). We examine published GEE models combined with additional features for linearity/non-linearity (Section 3). We fit a joint model with the linear/non-linear structure from the LME models in Section 2.

2 Linear Mixed-Effects Model (LME)

2.1 LME Model with Random Intercept

2.1.1 Assumptions

Average evolution: A linear evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time.

2.1.2 Model Estimation

## [1] "AIC =  9292"
## [1] "BIC =  9318"
Variance Components
StdDev
(Intercept) 22.589837
Residual 9.014941
Value Std.Error DF t-value p-value
(Intercept) 104.044 3.939 1054 26.410 0.000
Age -1.589 0.184 1054 -8.620 0.000
Age_Diag 0.789 0.525 148 1.504 0.135

As the coefficient estimate for ‘Age’ is around -1.589 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.589 (unit % predicted).

2.1.3 Model Inference

We can calculate the population-level evolution (left) and rate of change (right) over time. As we have assumed a linear evolution, the left figure shows a straight line, corresponding to a linear estimated fit. The rate of change is calculated by taking first derivative of the model equation over time. This model estimates the rate of change in FEV1 as constant over different ages as -1.589.

2.1.4 Model Diagnostics

We examined standardized residuals in terms of randomness, spread and nature of distribution.

  • The residuals vs fitted plot (upper left): XXXXX.
  • The residuals vs age (upper right): XXXXX.
  • The quantile-quantile (QQ) plot (lower left): XXXXX.
  • The density of residuals (lower right): XXXXX..

2.2 LME Model with Random Intercept and Slope

2.2.1 Assumptions

Average evolution: A linear evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have different evolution over time.

2.2.2 Model Estimation

## [1] "AIC =  9287"
## [1] "BIC =  9323"
Variance Components
StdDev
(Intercept) 14.199824
Age 0.589098
Residual 8.990464
Value Std.Error DF t-value p-value
(Intercept) 105.611 3.578 1054 29.516 0.000
Age -1.677 0.200 1054 -8.386 0.000
Age_Diag 0.770 0.570 148 1.351 0.179

As the estimation of coefficient for ‘Age’ is around -1.677 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.677 (unit % predicted).

2.2.3 Model Inference

2.2.4 Model Diagnostics

2.3 LME Model with 5-knot Natural Spline

2.3.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.

Patient-specific evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time. A diagonal variance-covariance matrix for the random effects are assumed. This means that the random effects are not correlated.

The knots, listed below, were chosen according to a recursive algorithm (reference: Spline Regression with Automatic Knot Selection, Vivien Goepp (2018)). There are 6 knots in total selected for the entire dataset (age: 6 years and older).

## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"

2.3.2 Model Estimation

## [1] "AIC =  9237"
## [1] "BIC =  9319"
Variance Components
StdDev
(Intercept) 16.218884
ns1 20.607789
ns2 17.685630
ns3 29.868936
ns4 31.471322
ns5 13.378872
ns6 66.527389
Residual 8.436234
Value Std.Error DF t-value p-value
(Intercept) 99.142 3.815 1049 25.985 0.000
ns1 -13.274 6.205 1049 -2.139 0.033
ns2 -19.625 6.127 1049 -3.203 0.001
ns3 -41.742 6.972 1049 -5.987 0.000
ns4 -27.559 18.199 1049 -1.514 0.130
ns5 -127.423 30.227 1049 -4.215 0.000
ns6 -171.810 63.189 1049 -2.719 0.007
Age_Diag 0.885 0.571 148 1.551 0.123

2.3.3 Model Inference

2.3.4 Model Diagnostics

2.4 Linear Random Intercepts with Exponential Correlation

2.4.1 Assumptions

Average evolution: A linear mean evolution is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time. We assume that each patient may start from a different time point but all patients have the same evolution over time. Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (ages).

2.4.2 Model Estimation

## [1] "AIC =  9216"
## [1] "BIC =  9252"
Variance Components
StdDev
(Intercept) 0.04146124
Residual 24.35877545
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Age | eDWID 
##  Parameter estimate(s):
##      range     nugget 
## 19.6114439  0.1043369
Value Std.Error DF t-value p-value
(Intercept) 103.294 4.047 1054 25.525 0.000
Age -1.539 0.191 1054 -8.041 0.000
Age_Diag 0.736 0.528 148 1.395 0.165

As the estimation of the coefficient for ‘Age’ is around -1.539 % predicted/year, this means for one year increase in age, FEV1 is decreasing by 1.539 (unit % predicted).

2.4.3 Model Inference

2.4.4 Model Diagnostics

2.5 Spline Random Intercept with Exponential Correlation

2.5.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is assumed over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point

Correlation structure: exponential correlation function. This means that, for each subject, the correlation of FEV1 exponentially decays to zero with the distance between times (age).

2.5.2 Model Estimation

## [1] "Knots location (at age): 9.8, 13.5, 17.3, 21.4, 31.3"
## [1] "AIC =  9219"
## [1] "BIC =  9280"
Variance Components
StdDev
(Intercept) 0.05198012
Residual 24.10691172
## Correlation Structure: Exponential spatial correlation
##  Formula: ~Age | eDWID 
##  Parameter estimate(s):
##      range     nugget 
## 19.8375331  0.1067286
Value Std.Error DF t-value p-value
(Intercept) 99.410 4.917 1049 20.217 0.000
ns1 -12.826 7.291 1049 -1.759 0.079
ns2 -18.860 7.274 1049 -2.593 0.010
ns3 -42.782 7.441 1049 -5.749 0.000
ns4 -26.507 18.945 1049 -1.399 0.162
ns5 -119.501 30.896 1049 -3.868 0.000
ns6 -157.231 64.642 1049 -2.432 0.015
Age_Diag 0.672 0.531 148 1.267 0.207

2.5.3 Model Inference

2.5.4 Model Diagnostics

3 Linear Generalized Estimating Equations Model (GEE)

Compared with LME models, GEE models allow for weaker distributional assumptions and are more robust to covariance misspecification. In this analysis, the average evolution (fixed-effects) structure in the GEE model was selected based on the performance of all LME models. This GEE model is fitted with a linear structure to model the mean evolution and accounts for longitudinal correlation using the sandwich estimator.

3.1 Assumptions

Average evolution: A linear evolution (assuming natural cubic splines) is used over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.

Correlation structure: independence (working) correlation function.

3.2 Model Estimation

Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 100.250 4.208 567.622 0.000
Age -1.596 0.176 82.669 0.000
Age_Diag 0.810 0.599 1.830 0.176
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 599.319 55.692

3.3 Model Inference

3.4 Model Diagnostics

4 Non-linear Generalized Estimating Equations Model (GEE)

In this analysis, the GEE mode is fitted with a spline structure to model the mean evolution and includes an independence correlation.

4.1 Assumptions

Average evolution: A non-linear evolution (assuming cubic natural splines) is used over time.

Patient-specific evolution: A linear evolution is assumed over time and each patient may start at different time point.

Correlation structure: independence correlation function (sandwich estimator used to account for longitudinal correlation).

4.2 Model Estimation

Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 91.492 8.997 103.401 0.000
ns1 -3.853 13.751 0.078 0.779
ns2 -15.036 13.746 1.196 0.274
ns3 -45.308 11.940 14.399 0.000
ns4 -17.317 23.482 0.544 0.461
ns5 -108.001 33.368 10.476 0.001
ns6 -168.365 64.493 6.815 0.009
Age_Diag 0.660 0.530 1.549 0.213
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 569.361 56.561

4.3 Model Inference

4.4 Model Diagnostics

5 Linear Joint Model of Longitudinal FEV1 (% predicted) and Survival (JM)

Missing data due to death or lung transplant is accounted for by jointly modeling the FEV1 process with the survival process using a Weibull model for survival. We use a Weibull baseline risk function to fit the baseline hazard in the survival sub-model. This joint model was implemented using the shared parameter framework. The shared parameters here corresponded to connecting the survival outcome with the underlying value of FEV1 and rate of change in FEV1 over time.

5.1 Assumptions

We assume that FEV1 and survival are associated over time (age). To fit the longitudinal sub-model for FEV1, we apply the LME model with linear assumption; for survival, we use the Weibull survival model.

Average evolution of FEV1: A linear evolution is assumed over time.

Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).

Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.

Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.

5.2 Model Estimation and Interpretation

## [1] "AIC =  9231"
## [1] "BIC =  9267"

The variance components for FEV1 include the standard deviations of the random intercept as “(Intercept)”, slope as “Age” and residual as “Residual” reported below.

Variance Components
StdDev
(Intercept) 12.429
Age 0.496
Residual 8.696
Longitudinal Process
Value Std.Err z-value p-value
(Intercept) 106.202 0.605 175.524 0
Age -1.688 0.028 -59.618 0
Age_Diag 0.697 0.081 8.635 0
Event Process
Value p-value Hazard Ratio Lower Upper
(Intercept) -81.815 0.017 0.000 0.000 0.000
Age_Diag -0.273 0.012 0.761 0.615 0.943
Assoct 0.153 0.142 1.165 0.950 1.428
Assoct.s -11.658 0.023 0.000 0.000 0.198
log(shape) 2.698 0.000 14.850 6.883 32.038

5.3 Model Inference

5.4 Model Diagnostics

## Warning in rm(list = ".Random.seed", envir = globalenv()): object
## '.Random.seed' not found

6 Non-linear Joint Model of Longitudinal FEV1(% predicted) and Survival (JM)

6.1 Assumptions

Average evolution of FEV1: A non-linear evolution by cubic natural spline is assumed over time.

Patient-specific evolution of FEV1: A linear evolution is assumed over time and each patient may start at different time point (and have the same evolution).

Survival Baseline hazard: A Weibull baseline risk hazard is assumed over time at baseline.

Association: We assume the underlying value of FEV1 and rate of change over time to be associated with survival.

6.2 Model Estimation and Interpretation

## [1] "AIC =  9435"
## [1] "BIC =  9486"
Variance Components
StdDev
(Intercept) 14.060
Age 0.652
Residual 8.953
Longitudinal Process
Value Std.Err z-value p-value
(Intercept) 99.181 3.616 27.431 0.000
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))1 -15.190 5.064 -3.000 0.003
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))2 -12.979 5.194 -2.499 0.012
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))3 -43.076 5.898 -7.304 0.000
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))4 -28.549 8.422 -3.390 0.001
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))5 -79.941 7.905 -10.112 0.000
ns(Age, knots = c(9.8, 13.5, 17.3, 21.4, 31.3))6 -81.521 4.657 -17.504 0.000
Age_Diag 0.749 0.538 1.394 0.163
Event Process
Value p-value Hazard Ratio Lower Upper
(Intercept) -5.094 0.264 0.006 0.000 47.029
Age_Diag -0.132 0.132 0.876 0.738 1.041
Assoct -0.096 0.000 0.909 0.871 0.947
Assoct.s 0.626 0.065 1.870 0.961 3.637
log(shape) 1.039 0.011 2.826 1.270 6.289

6.3 Model Inference

6.4 Model Diagnostics

## Warning in rm(list = ".Random.seed", envir = globalenv()): object
## '.Random.seed' not found

7 Plot Population-level FEV1 (% predicted) Estimation for All Models

Model Name Abbreviation
Name Specification
Int Linear evolution with random intercept
Int + Slope Linear evolution with random intercept and slope
Spline Non-linear evolution by using spline as both average evolution and random effect
Int + Corr Linear evolution with random intercept and exponential correlation structure
Spline + Corr Non-linear evolution with spline as average evolution, random intercept and exponential correlation structure
Lin_GEE Linear evolution with independence correlation structure
Nonlin_GEE Non-linear evolution by using spline with independence correlation structure
Lin_JM Linear evolution with jointly model FEV and survival
Nonlin_JM Non-linear evolution with jointly model FEV and survival

To check more closely, we zoom in the result for age 6 to 30.